Improving prediction accuracy of drug activities by utilising unlabelled instances with feature selection
Abstract
Molecular activities can be predicted with Quantitative Structure Activity Relationship (QSAR) models. Because experiments are costly, far fewer drug molecules have known activity than unknown activity, so predicting molecular activities by utilising unlabelled instances is an interesting problem. Here, Semi-Supervised Learning (SSL) is introduced, and one SSL method, Co-Training, is investigated for predicting drug activities utilising unlabelled instances. In addition, a novel algorithm called FESCOT is proposed, which applies feature selection to remove redundant and irrelevant features for Co-Training. Numerical experimental results show that Co-Training with feature selection helps to improve the prediction ability of Co-Training.
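The abstract does not spell out how FESCOT works, so the following is only a minimal sketch of the general idea it describes: select features on the labelled data first, then run Co-Training over two feature views so that confident predictions on unlabelled instances enlarge the training set. Everything concrete here is an assumption made for illustration, not the authors' method: the Fisher-style filter score, the nearest-centroid base learner, the alternating split of the selected features into two views, the 0/1 labels, and the names `select_features`, `fit_centroids`, `predict` and `co_train`.

```python
from statistics import mean, pvariance


def select_features(X, y, k):
    """Return indices of the k features with the highest Fisher-style score.

    Illustrative filter only; the source does not say which selector FESCOT uses.
    """
    scored = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        spread = (pvariance(pos) + pvariance(neg)) ** 0.5 + 1e-9
        scored.append((abs(mean(pos) - mean(neg)) / spread, j))
    top = sorted(scored, reverse=True)[:k]
    return sorted(j for _, j in top)


def fit_centroids(X, y, view):
    """Nearest-centroid base learner restricted to the features in `view`."""
    cents = {}
    for lab in (0, 1):
        rows = [x for x, l in zip(X, y) if l == lab]
        cents[lab] = [mean(r[j] for r in rows) for j in view]
    return cents


def predict(cents, x, view):
    """Return (label, confidence); confidence is the gap between centroid distances."""
    d = {lab: sum((x[j] - c) ** 2 for j, c in zip(view, cen))
         for lab, cen in cents.items()}
    return min(d, key=d.get), abs(d[0] - d[1])


def co_train(X_lab, y_lab, X_unl, k=4, rounds=5, per_round=2):
    """Feature selection followed by Co-Training; returns a classify(x) function.

    Assumes labels are 0/1, both classes are present, and k >= 2.
    """
    feats = select_features(X_lab, y_lab, k)
    view_a, view_b = feats[0::2], feats[1::2]  # two views over the selected features
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(rounds):
        if not pool:
            break
        for view in (view_a, view_b):
            cents = fit_centroids(X_lab, y_lab, view)
            # Each view pseudo-labels the unlabelled instances it is most sure about.
            ranked = sorted(pool, key=lambda x: predict(cents, x, view)[1],
                            reverse=True)
            for x in ranked[:per_round]:
                X_lab.append(x)
                y_lab.append(predict(cents, x, view)[0])
                pool.remove(x)
    cents_a = fit_centroids(X_lab, y_lab, view_a)
    cents_b = fit_centroids(X_lab, y_lab, view_b)

    def classify(x):
        # Final decision: trust whichever view is more confident on this instance.
        label_a, conf_a = predict(cents_a, x, view_a)
        label_b, conf_b = predict(cents_b, x, view_b)
        return label_a if conf_a >= conf_b else label_b

    return classify
```

Calling `co_train(X_lab, y_lab, X_unl)` with lists of numeric feature vectors returns a function that labels a new instance as 0 or 1. Unlike the classic Co-Training scheme, this sketch does not balance the number of positive and negative pseudo-labels added per round; with the skewed class ratios common in QSAR data, that would likely need attention.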
Related resources
Evaluation of Classifiers in Software Fault-Proneness Prediction
The reliability of software depends on its fault-prone modules: the fewer fault-prone units the software contains, the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules in the software, it will be possible to judge its reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...
Improving Chernoff criterion for classification by using the filled function
Linear discriminant analysis is a well-known matrix-based dimensionality reduction method. It is a supervised feature extraction method used in two-class classification problems. However, it is incapable of dealing with data in which classes have unequal covariance matrices. To address this issue, the Chernoff distance is an appropriate criterion for measuring distances between distributions. In the p...
A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection
The K nearest neighbor (KNN) algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affect its classification accuracy. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...
IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF
The increasing use of the Internet and of phenomena such as sensor networks has led to an unnecessary increase in the volume of information. Though this has many benefits, it causes problems such as the need for more storage space and better processors, as well as for data refinement to remove unnecessary data. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomp...
Unbalance Quantitative Structure Activity Relationship Problem Reduction in Drug Design
Problem statement: Activities of drug molecules can be predicted by Quantitative Structure Activity Relationship (QSAR) models, which overcome the high cost and long cycle of traditional experimental methods. Because far fewer drug molecules have positive activity than negative activity, it is important to predict molecular activities consi...
Journal title:
- International Journal of Computational Biology and Drug Design
Volume 1, Issue 1
Pages -
Publication date 2008